
Interpretation and plotting

The sparse networks created with BINN can be interpreted using various post-hoc interpretation methods. We use SHAP to explain which nodes are important for the classifications made by the BINN.
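For orientation: SHAP assigns each input node a contribution such that the contributions sum to the difference between the model's output for a sample and the expected output over the background data (SHAP's "local accuracy" property). A toy numeric illustration, with invented numbers:

```python
baseline = 0.30                    # expected model output over the background data
prediction = 0.85                  # model output for one particular sample
shap_values = [0.25, 0.20, 0.10]   # per-node contributions (toy numbers)

# local accuracy: baseline plus the contributions recovers the prediction
assert abs(baseline + sum(shap_values) - prediction) < 1e-9
```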

As in the BINN example, we load some example data and generate the network and the BINN.

In [1]:
from binn import Network, BINN
import pandas as pd

input_data = pd.read_csv("../data/test_qm.csv")
translation = pd.read_csv("../data/translation.tsv", sep="\t")
pathways = pd.read_csv("../data/pathways.tsv", sep="\t")

network = Network(
    input_data=input_data,
    pathways=pathways,
    mapping=translation,
)

binn = BINN(
    pathways=network,
    n_layers=4,
    dropout=0.2,
    validate=False,
    residual=True
)

We load some test data and train the network.

In [2]:
from util_for_examples import fit_data_matrix_to_network_input, generate_data
import torch
from pytorch_lightning import Trainer

design_matrix = pd.read_csv('../data/design_matrix.tsv', sep="\t")
protein_matrix = pd.read_csv('../data/test_qm.csv')

protein_matrix = fit_data_matrix_to_network_input(
    protein_matrix, features=network.inputs)

X, y = generate_data(protein_matrix, design_matrix=design_matrix)

dataset = torch.utils.data.TensorDataset(torch.Tensor(X), torch.LongTensor(y))
dataloader = torch.utils.data.DataLoader(
    dataset=dataset,
    batch_size=8,
    num_workers=12,
    shuffle=True,
)
trainer = Trainer(max_epochs=25)
trainer.fit(binn, dataloader)
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.

  | Name   | Type             | Params
--------------------------------------------
0 | layers | Sequential       | 365 K 
1 | loss   | CrossEntropyLoss | 0     
--------------------------------------------
365 K     Trainable params
0         Non-trainable params
365 K     Total params
1.464     Total estimated model params size (MB)
The number of training batches (25) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 24: 100%|██████████| 25/25 [00:00<00:00, 56.33it/s, loss=0.532, v_num=28, train_loss=0.538, train_acc=0.853]
`Trainer.fit` stopped: `max_epochs=25` reached.

Generate the BINNExplainer object.

In [3]:
from binn import BINNExplainer

explainer = BINNExplainer(binn)

Explain the network using SHAP. The value column in the resulting dataframe contains the SHAP values. Note that each source and target name is now suffixed with its layer index (e.g. A0M8Q6_0), so that copies of the same node in different layers can be told apart.
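If you want to aggregate importances per protein or pathway regardless of layer, the suffix can be split off again. A minimal sketch in plain Python (it assumes, as in the dataframe below, that the layer index is the final underscore-separated token):

```python
def split_node_name(node):
    """Split a layer-suffixed name like 'R-HSA-166663_1' into (entity, layer)."""
    entity, layer = node.rsplit("_", 1)
    return entity, int(layer)

print(split_node_name("A0M8Q6_0"))        # → ('A0M8Q6', 0)
print(split_node_name("R-HSA-166663_1"))  # → ('R-HSA-166663', 1)
```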

In [4]:
test_data = torch.Tensor(X)
background_data = torch.Tensor(X)

importance_df = explainer.explain(test_data, background_data)
importance_df.head()
Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
Out[4]:
| | source | target | value | type | source layer | target layer |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | A0M8Q6_0 | R-HSA-166663_1 | 0.000509 | 0 | 0 | 1 |
| 1 | A0M8Q6_0 | R-HSA-166663_1 | 0.000805 | 1 | 0 | 1 |
| 2 | A0M8Q6_0 | R-HSA-977606_1 | 0.000509 | 0 | 0 | 1 |
| 3 | A0M8Q6_0 | R-HSA-977606_1 | 0.000805 | 1 | 0 | 1 |
| 4 | A0M8Q6_0 | R-HSA-2029481_1 | 0.000509 | 0 | 0 | 1 |

The importance dataframe can be used to create an importance network.
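Conceptually, the importance network is a directed graph built from the dataframe's source/target edge list, with the SHAP values as edge weights. A standard-library sketch of that idea (the toy edges mirror rows of the dataframe above; `ImportanceNetwork` itself stores more than this):

```python
from collections import defaultdict

# toy edge list shaped like the (source, target, value) rows of importance_df
edges = [
    ("A0M8Q6_0", "R-HSA-166663_1", 0.000509),
    ("A0M8Q6_0", "R-HSA-977606_1", 0.000509),
    ("R-HSA-166663_1", "root_5", 0.002294),
]

graph = defaultdict(list)  # adjacency: source -> [(target, value), ...]
for source, target, value in edges:
    graph[source].append((target, value))

print(sorted(graph))  # → ['A0M8Q6_0', 'R-HSA-166663_1']
```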

In [5]:
from binn import ImportanceNetwork

IG = ImportanceNetwork(importance_df)
In [6]:
IG.plot_complete_sankey(multiclass=False, savename='img/complete_sankey.png', node_cmap='Reds', edge_cmap='Blues')

[Complete Sankey plot: img/complete_sankey.png]

The importance network can then be used to generate plots. Here we generate a downstream Sankey plot (upstream=False) originating from the node 'P02766'.

In [7]:
query_node = 'P02766'

IG.plot_subgraph_sankey(query_node, upstream=False, savename='img/subgraph_sankey.png', cmap='BuGn')

[Subgraph Sankey plot: img/subgraph_sankey.png]
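The upstream/downstream switch amounts to walking the edge list in one direction from the query node. A minimal breadth-first sketch over a toy edge list (the pathway names are invented; the real plot_subgraph_sankey additionally weights and renders the edges):

```python
from collections import deque

edges = [
    ("P02766_0", "pathwayA_1", 0.01),  # toy pathway names
    ("pathwayA_1", "root_5", 0.02),
    ("Q0_0", "pathwayA_1", 0.03),
]

def reachable_edges(edges, query, upstream=False):
    """Collect every edge reachable from `query`, following edges
    source->target (downstream) or target->source (upstream)."""
    found, frontier, seen = [], deque([query]), {query}
    while frontier:
        node = frontier.popleft()
        for source, target, value in edges:
            here, there = (target, source) if upstream else (source, target)
            if here == node:
                found.append((source, target, value))
                if there not in seen:
                    seen.add(there)
                    frontier.append(there)
    return found

print(len(reachable_edges(edges, "P02766_0")))               # → 2
print(len(reachable_edges(edges, "root_5", upstream=True)))  # → 3
```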

We can also run the explainer for several iterations and compute the average importance.
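For reference, the averaging itself is just a per-edge mean and spread over the per-iteration value columns (value_0, value_1, value_2). A standard-library sketch; judging by the resulting dataframe, the spread looks like a population standard deviation, but treat that as an assumption, and note that recomputing from the table's rounded figures only approximately reproduces values_std:

```python
from statistics import mean, pstdev

# per-iteration SHAP values for a single edge (value_0, value_1, value_2)
values = [0.001052, 0.000747, 0.000431]

value_mean = mean(values)
value_std = pstdev(values)  # population std; statistics.stdev would give the sample std
print(round(value_mean, 6), round(value_std, 6))
```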

In [9]:
average_importance_df = explainer.explain_average(
    test_data=test_data,
    background_data=background_data,
    nr_iterations=3,
    dataloader=dataloader,
    max_epochs=5,
)
average_importance_df
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.

  | Name   | Type             | Params
--------------------------------------------
0 | layers | Sequential       | 365 K 
1 | loss   | CrossEntropyLoss | 0     
--------------------------------------------
365 K     Trainable params
0         Non-trainable params
365 K     Total params
1.464     Total estimated model params size (MB)
The number of training batches (25) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 4: 100%|██████████| 25/25 [00:00<00:00, 43.09it/s, loss=0.648, v_num=4, train_loss=0.645, train_acc=0.701]
`Trainer.fit` stopped: `max_epochs=5` reached.
(similar output repeated for the remaining two iterations)
Out[9]:

| | source | target | type | source layer | target layer | value_0 | value_1 | value_2 | value_mean | values_std | value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | A0M8Q6_0 | R-HSA-166663_1 | 0 | 0 | 1 | 0.001052 | 0.000747 | 0.000431 | 0.000743 | 0.000253 | 0.000743 |
| 1 | A0M8Q6_0 | R-HSA-166663_1 | 1 | 0 | 1 | 0.000491 | 0.003720 | 0.006589 | 0.003600 | 0.002491 | 0.003600 |
| 2 | A0M8Q6_0 | R-HSA-977606_1 | 0 | 0 | 1 | 0.001052 | 0.000747 | 0.000431 | 0.000743 | 0.000253 | 0.000743 |
| 3 | A0M8Q6_0 | R-HSA-977606_1 | 1 | 0 | 1 | 0.000491 | 0.003720 | 0.006589 | 0.003600 | 0.002491 | 0.003600 |
| 4 | A0M8Q6_0 | R-HSA-2029481_1 | 0 | 0 | 1 | 0.001052 | 0.000747 | 0.000431 | 0.000743 | 0.000253 | 0.000743 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6901 | R-HSA-8953897_4 | root_5 | 1 | 4 | 5 | 0.005240 | 0.000919 | 0.000724 | 0.002294 | 0.002084 | 0.002294 |
| 6902 | R-HSA-1474244_4 | root_5 | 0 | 4 | 5 | 0.008656 | 0.005115 | 0.008671 | 0.007481 | 0.001673 | 0.007481 |
| 6903 | R-HSA-1474244_4 | root_5 | 1 | 4 | 5 | 0.009226 | 0.005945 | 0.007662 | 0.007611 | 0.001340 | 0.007611 |
| 6904 | R-HSA-1430728_4 | root_5 | 0 | 4 | 5 | 0.004243 | 0.004448 | 0.003284 | 0.003991 | 0.000507 | 0.003991 |
| 6905 | R-HSA-1430728_4 | root_5 | 1 | 4 | 5 | 0.006746 | 0.004260 | 0.000137 | 0.003714 | 0.002726 | 0.003714 |

6906 rows × 11 columns